Wakayama Prefecture
Efficient and Minimax-optimal In-context Nonparametric Regression with Transformers
Ching, Michelle, Popescu, Ioana, Smith, Nico, Ma, Tianyi, Underwood, William G., Samworth, Richard J.
We study in-context learning for nonparametric regression with $α$-Hölder smooth regression functions, for some $α>0$. We prove that, with $n$ in-context examples and $d$-dimensional regression covariates, a pretrained transformer with $Θ(\log n)$ parameters and $Ω\bigl(n^{2α/(2α+d)}\log^3 n\bigr)$ pretraining sequences can achieve the minimax-optimal rate of convergence $O\bigl(n^{-2α/(2α+d)}\bigr)$ in mean squared error. Our result requires substantially fewer transformer parameters and pretraining sequences than previous results in the literature. This is achieved by showing that transformers are able to approximate local polynomial estimators efficiently by implementing a kernel-weighted polynomial basis and then running gradient descent.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- Asia > Japan > Honshū > Kansai > Wakayama Prefecture > Wakayama (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.60)
Development of a Compliant Gripper for Safe Robot-Assisted Trouser Dressing-Undressing
Unde, Jayant, Inden, Takumi, Wakayama, Yuki, Colan, Jacinto, Zhu, Yaonan, Aoyama, Tadayoshi, Hasegawa, Yasuhisa
In recent years, many countries, including Japan, have rapidly aging populations, making the preservation of seniors' quality of life a significant concern. For elderly people with impaired physical abilities, support for toileting is one of the most important issues. This paper details the design, development, experimental assessment, and potential application of the gripper system, with a focus on the unique requirements and obstacles involved in aiding elderly or hemiplegic individuals in dressing and undressing trousers. The gripper we propose seeks to find the right balance between compliance and grasping forces, ensuring precise manipulation while maintaining a safe and compliant interaction with the users. The gripper's integration into a custom--built robotic manipulator system provides a comprehensive solution for assisting hemiplegic individuals in their dressing and undressing tasks. Experimental evaluations and comparisons with existing studies demonstrate the gripper's ability to successfully assist in both dressing and dressing of trousers in confined spaces with a high success rate. This research contributes to the advancement of assistive robotics, empowering elderly, and physically impaired individuals to maintain their independence and improve their quality of life.
- Asia > Japan > Honshū > Kansai > Wakayama Prefecture > Wakayama (0.41)
- Asia > Japan > Honshū > Chūbu > Aichi Prefecture > Nagoya (0.05)
- South America > Uruguay > Montevideo > Montevideo (0.04)
- (8 more...)
Bridging Probabilistic Inference and Behavior Trees: An Interactive Framework for Adaptive Multi-Robot Cooperation
Wang, Chaoran, Sun, Jingyuan, Zhang, Yanhui, Wu, Changju
This paper proposes an Interactive Inference Behavior Tree (IIBT) framework that integrates behavior trees (BTs) with active inference under the free energy principle for distributed multi-robot decision-making. The proposed IIBT node extends conventional BTs with probabilistic reasoning, enabling online joint planning and execution across multiple robots. It remains fully compatible with standard BT architectures, allowing seamless integration into existing multi-robot control systems. Within this framework, multi-robot cooperation is formulated as a free-energy minimization process, where each robot dynamically updates its preference matrix based on perceptual inputs and peer intentions, thereby achieving adaptive coordination in partially observable and dynamic environments. The proposed approach is validated through both simulation and real-world experiments, including a multi-robot maze navigation and a collaborative manipulation task, compared against traditional BTs(https://youtu.be/KX_oT3IDTf4). Experimental results demonstrate that the IIBT framework reduces BT node complexity by over 70%, while maintaining robust, interpretable, and adaptive cooperative behavior under environmental uncertainty.
Detect Any Sound: Open-Vocabulary Sound Event Detection with Multi-Modal Queries
Cai, Pengfei, Song, Yan, Gu, Qing, Jiang, Nan, Song, Haoyu, McLoughlin, Ian
Most existing sound event detection~(SED) algorithms operate under a closed-set assumption, restricting their detection capabilities to predefined classes. While recent efforts have explored language-driven zero-shot SED by exploiting audio-language models, their performance is still far from satisfactory due to the lack of fine-grained alignment and cross-modal feature fusion. In this work, we propose the Detect Any Sound Model (DASM), a query-based framework for open-vocabulary SED guided by multi-modal queries. DASM formulates SED as a frame-level retrieval task, where audio features are matched against query vectors derived from text or audio prompts. To support this formulation, DASM introduces a dual-stream decoder that explicitly decouples event recognition and temporal localization: a cross-modality event decoder performs query-feature fusion and determines the presence of sound events at the clip-level, while a context network models temporal dependencies for frame-level localization. Additionally, an inference-time attention masking strategy is proposed to leverage semantic relations between base and novel classes, substantially enhancing generalization to novel classes. Experiments on the AudioSet Strong dataset demonstrate that DASM effectively balances localization accuracy with generalization to novel classes, outperforming CLAP-based methods in open-vocabulary setting (+ 7.8 PSDS) and the baseline in the closed-set setting (+ 6.9 PSDS). Furthermore, in cross-dataset zero-shot evaluation on DESED, DASM achieves a PSDS1 score of 42.2, even exceeding the supervised CRNN baseline. The project page is available at https://cai525.github.io/Transformer4SED/demo_page/DASM/.
- Europe > Ireland > Leinster > County Dublin > Dublin (0.05)
- Asia > China > Anhui Province (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- (4 more...)
Provable test-time adaptivity and distributional robustness of in-context learning
Ma, Tianyi, Wang, Tengyao, Samworth, Richard J.
We study in-context learning problems where a Transformer is pretrained on tasks drawn from a mixture distribution $π=\sum_{α\in\mathcal{A}} λ_α π_α$, called the pretraining prior, in which each mixture component $π_α$ is a distribution on tasks of a specific difficulty level indexed by $α$. Our goal is to understand the performance of the pretrained Transformer when evaluated on a different test distribution $μ$, consisting of tasks of fixed difficulty $β\in\mathcal{A}$, and with potential distribution shift relative to $π_β$, subject to the chi-squared divergence $χ^2(μ,π_β)$ being at most $κ$. In particular, we consider nonparametric regression problems with random smoothness, and multi-index models with random smoothness as well as random effective dimension. We prove that a large Transformer pretrained on sufficient data achieves the optimal rate of convergence corresponding to the difficulty level $β$, uniformly over test distributions $μ$ in the chi-squared divergence ball. Thus, the pretrained Transformer is able to achieve faster rates of convergence on easier tasks and is robust to distribution shift at test time. Finally, we prove that even if an estimator had access to the test distribution $μ$, the convergence rate of its expected risk over $μ$ could not be faster than that of our pretrained Transformers, thereby providing a more appropriate optimality guarantee than minimax lower bounds.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- Asia > Japan > Honshū > Kansai > Wakayama Prefecture > Wakayama (0.04)
- Europe > Switzerland > Basel-City > Basel (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- (2 more...)
In-Context Learning Is Provably Bayesian Inference: A Generalization Theory for Meta-Learning
Wakayama, Tomoya, Suzuki, Taiji
This paper develops a finite-sample statistical theory for in-context learning (ICL), analyzed within a meta-learning framework that accommodates mixtures of diverse task types. We introduce a principled risk decomposition that separates the total ICL risk into two orthogonal components: Bayes Gap and Posterior Variance. The Bayes Gap quantifies how well the trained model approximates the Bayes-optimal in-context predictor. For a uniform-attention Transformer, we derive a non-asymptotic upper bound on this gap, which explicitly clarifies the dependence on the number of pretraining prompts and their context length. The Posterior Variance is a model-independent risk representing the intrinsic task uncertainty. Our key finding is that this term is determined solely by the difficulty of the true underlying task, while the uncertainty arising from the task mixture vanishes exponentially fast with only a few in-context examples. Together, these results provide a unified view of ICL: the Transformer selects the optimal meta-algorithm during pretraining and rapidly converges to the optimal algorithm for the true task at test time.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > Singapore (0.04)
- North America > United States > Rhode Island > Providence County > Providence (0.04)
- (9 more...)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.67)
On Prior Distributions for Orthogonal Function Sequences
Sugasawa, Shonosuke, Mochihashi, Daichi
We propose a novel class of prior distributions for sequences of orthogonal functions, which are frequently required in various statistical models such as functional principal component analysis (FPCA). Our approach constructs priors sequentially by imposing adaptive orthogonality constraints through a hierarchical formulation of conditionally normal distributions. The orthogonality is controlled via hyperparameters, allowing for flexible trade-offs between exactness and smoothness, which can be learned from the observed data. We illustrate the properties of the proposed prior and show that it leads to nearly orthogonal posterior estimates. The proposed prior is employed in Bayesian FPCA, providing more interpretable principal functions and efficient low-rank representations. Through simulation studies and analysis of human mobility data in Tokyo, we demonstrate the superior performance of our approach in inducing orthogonality and improving functional component estimation.
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.25)
- Asia > Japan > Honshū > Kansai > Wakayama Prefecture > Wakayama (0.05)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.93)
- Information Technology > Modeling & Simulation (0.89)
CNN-based Surface Temperature Forecasts with Ensemble Numerical Weather Prediction over Medium-range Forecast Periods
Inoue, Takuya, Kawabata, Takuya
This study proposes a method that integrates convolutional neural networks (CNNs) with ensemble numerical weather prediction (NWP) models, enabling surface temperature forecasting at lead times beyond the short-range (five-day) forecast period. Owing to limited computational resources, operational medium-range temperature forecasts typically rely on low-resolution NWP models, which are prone to systematic and random errors. To resolve these limitations, the proposed method first reduces systematic errors through CNN-based post-processing (bias correction and spatial super-resolution) on each ensemble member, reconstructing high-resolution temperature fields from low-resolution model outputs. Second, it reduces random errors through ensemble averaging of the CNN-corrected members. This study also investigates whether the sequence of CNN correction and ensemble averaging affects the forecast accuracy. For comparison with the proposed method, we additionally conducted experiments with the CNN trained on ensemble-averaged forecasts. The first approach--CNN correction before ensemble averaging--consistently achieved higher accuracy than the reverse approach. Although based on low-resolution ensemble forecasts, the proposed method notably outperformed the high-resolution deterministic NWP models. These findings indicate that combining CNN-based correction with ensemble averaging effectively reduces both the systematic and random errors in NWP model outputs. The proposed approach is a practical and scalable solution for improving medium-range temperature forecasts, and is particularly valuable at operational centers with limited computational resources.
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.16)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > Japan > Honshū > Kansai > Wakayama Prefecture > Wakayama (0.04)
- (3 more...)
Spatial Language Likelihood Grounding Network for Bayesian Fusion of Human-Robot Observations
Sitdhipol, Supawich, Sukprasongdee, Waritwong, Chuangsuwanich, Ekapol, Tse, Rina
Fusing information from human observations can help robots overcome sensing limitations in collaborative tasks. However, an uncertainty-aware fusion framework requires a grounded likelihood representing the uncertainty of human inputs. This paper presents a Feature Pyramid Likelihood Grounding Network (FP-LGN) that grounds spatial language by learning relevant map image features and their relationships with spatial relation semantics. The model is trained as a probability estimator to capture aleatoric uncertainty in human language using three-stage curriculum learning. Results showed that FP-LGN matched expert-designed rules in mean Negative Log-Likelihood (NLL) and demonstrated greater robustness with lower standard deviation. Collaborative sensing results demonstrated that the grounded likelihood successfully enabled uncertainty-aware fusion of heterogeneous human language observations and robot sensor measurements, achieving significant improvements in human-robot collaborative task performance.
- Oceania > Australia (0.04)
- North America > United States > New Mexico > San Juan County (0.04)
- North America > United States > District of Columbia > Washington (0.04)
- (2 more...)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (0.68)
- (2 more...)
Neuro-MSBG: An End-to-End Neural Model for Hearing Loss Simulation
Yuan, Hui-Guan, Zezario, Ryandhimas E., Ahmed, Shafique, Wang, Hsin-Min, Hua, Kai-Lung, Tsao, Yu
Hearing loss simulation models are essential for hearing aid deployment. However, existing models have high computational complexity and latency, which limits real-time applications and lack direct integration with speech processing systems. To address these issues, we propose Neuro-MSBG, a lightweight end-to-end model with a personalized audiogram encoder for effective time-frequency modeling. Experiments show that Neuro-MSBG supports parallel inference and retains the intelligibility and perceptual quality of the original MSBG, with a Spearman's rank correlation coefficient (SRCC) of 0.9247 for Short-Time Objective Intelligibility (STOI) and 0.8671 for Perceptual Evaluation of Speech Quality (PESQ). Neuro-MSBG reduces simulation runtime by a factor of 46 (from 0.970 seconds to 0.021 seconds for a 1 second input), further demonstrating its efficiency and practicality.